Partially Supervised Sense Disambiguation by Learning Sense Number from Tagged and Untagged Corpora

Authors

Zheng-Yu Niu

Dong-Hong Ji

Chew Lim Tan

Abstract

Supervised and semi-supervised sense disambiguation methods will mis-tag the instances of a target word if the senses of these instances are not defined in sense inventories or if there are no tagged instances for these senses in the training data. Here we use a model order identification method to avoid misclassifying instances with undefined senses by discovering new senses from mixed data (tagged and untagged corpora). The algorithm seeks a natural partition of the mixed data by maximizing, over all possible values of the number of senses (the sense number, or model order), a stability criterion defined on the classification result of an extended label propagation algorithm. Experimental results on SENSEVAL-3 data indicate that it outperforms SVM, a one-class partially supervised classification algorithm, and a clustering-based model order identification algorithm when the tagged data is incomplete.
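The abstract does not specify the extended label propagation algorithm or the exact stability criterion, so the following Python sketch is only a minimal illustration under stated assumptions, not the authors' method. It runs standard label propagation over a Gaussian-weighted graph, seeds each candidate new sense with a hypothetical farthest-point heuristic, scores each candidate sense number k by the average pairwise co-assignment agreement between a run on all the data and runs on random subsamples of the untagged instances, and returns the k with the highest score. All function names and parameters (`sigma`, `iters`, `rounds`, `frac`) are illustrative, and known senses are assumed to be numbered 0..c-1.

```python
import math
import random


def label_propagation(points, seeds, num_classes, sigma=1.0, iters=100):
    """Standard label propagation; seeds maps point index -> class id."""
    n = len(points)
    # Row-normalized transitions P = D^-1 W, W_ij = exp(-||xi - xj||^2 / sigma^2).
    P = []
    for i in range(n):
        row = [0.0 if i == j else math.exp(-math.dist(points[i], points[j]) ** 2 / sigma ** 2)
               for j in range(n)]
        total = sum(row)
        P.append([w / total for w in row] if total > 0 else row)
    # Tagged rows start one-hot; untagged rows start uniform.
    F = [[1.0 / num_classes] * num_classes for _ in range(n)]
    for i, c in seeds.items():
        F[i] = [1.0 if k == c else 0.0 for k in range(num_classes)]
    for _ in range(iters):
        new_F = [[sum(P[i][j] * F[j][k] for j in range(n)) for k in range(num_classes)]
                 for i in range(n)]
        # Clamp tagged rows back to their known senses after every step.
        for i, c in seeds.items():
            new_F[i] = [1.0 if k == c else 0.0 for k in range(num_classes)]
        F = new_F
    return [max(range(num_classes), key=lambda k: F[i][k]) for i in range(n)]


def seed_new_senses(points, seeds, k):
    """HYPOTHETICAL heuristic: add k - c new-sense seeds by farthest-point selection."""
    seeds = dict(seeds)
    next_class = max(seeds.values()) + 1  # assumes known senses are 0..c-1
    while next_class < k:
        candidates = [i for i in range(len(points)) if i not in seeds]
        far = max(candidates,
                  key=lambda i: min(math.dist(points[i], points[s]) for s in seeds))
        seeds[far] = next_class
        next_class += 1
    return seeds


def pairwise_agreement(a, b):
    """Fraction of point pairs on which two labelings agree about 'same sense or not'."""
    n = len(a)
    pairs = agree = 0
    for i in range(n):
        for j in range(i + 1, n):
            pairs += 1
            agree += (a[i] == a[j]) == (b[i] == b[j])
    return agree / pairs if pairs else 1.0


def stability(points, seeds, k, rounds=5, frac=0.8, rng=None):
    """Average agreement between a full-data run and runs on random subsamples."""
    rng = rng or random.Random(0)
    reference = label_propagation(points, seed_new_senses(points, seeds, k), k)
    untagged = [i for i in range(len(points)) if i not in seeds]
    scores = []
    for _ in range(rounds):
        # Keep every tagged instance; subsample only the untagged ones.
        keep = sorted(list(seeds) + rng.sample(untagged, int(frac * len(untagged))))
        sub_points = [points[i] for i in keep]
        sub_seeds = {keep.index(i): c for i, c in seeds.items()}
        sub_labels = label_propagation(sub_points, seed_new_senses(sub_points, sub_seeds, k), k)
        scores.append(pairwise_agreement([reference[i] for i in keep], sub_labels))
    return sum(scores) / len(scores)


def choose_sense_number(points, seeds, k_max, **kwargs):
    """Return (best k, scores) for k from the number of known senses up to k_max."""
    c = len(set(seeds.values()))
    scores = {k: stability(points, seeds, k, **kwargs) for k in range(c, k_max + 1)}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data: senses 0 and 1 have one tagged instance each; a third cluster has none.
    centers = [(0.0, 0.0), (6.0, 0.0), (3.0, 5.0)]
    points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
              for cx, cy in centers for _ in range(8)]
    best_k, scores = choose_sense_number(points, {0: 0, 8: 1}, k_max=4)
    print(best_k, scores)
```

Raw agreement scores of this kind can be biased toward particular values of k, which is why stability-based model selection methods often normalize against a random-labeling baseline; this sketch omits that step for brevity.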

Similar resources

Improving the Collocation Extraction Method Using an Untagged Corpus for Persian Word Sense Disambiguation

Word sense disambiguation is used in many natural language processing fields. One of the ways of disambiguation is the use of decision list algorithm which is a supervised method. Supervised methods are considered as the most accurate machine learning algorithms but they are strongly influenced by knowledge acquisition bottleneck which means that their efficiency depends on the size of the tagg...

A Bayesian Approach to Semi-Supervised Learning

Recent research in automated learning has focused on algorithms that learn from a combination of tagged and untagged data. Such algorithms can be referred to as semi-supervised in contrast to unsupervised, which refers to algorithms requiring no tagged data whatsoever. This paper presents a Bayesian approach to semi-supervised learning. In this approach, the parameters of a probability model ar...

Exploiting Parallel Texts for Word Sense Disambiguation: An Empirical Study

A central problem of word sense disambiguation (WSD) is the lack of manually sense-tagged data required for supervised learning. In this paper, we evaluate an approach to automatically acquire sense-tagged training data from English-Chinese parallel corpora, which are then used for disambiguating the nouns in the SENSEVAL-2 English lexical sample task. Our investigation reveals that this method ...

Towards A Hybrid Approach To Word-Sense Disambiguation In Machine Translation

The task of word sense disambiguation aims to select the correct sense of a polysemous word in a given context. When applied to machine translation, the correct translation in the target language must be selected for a polysemous lexical item in the source language. In this paper, we present work in progress on a supervised WSD system with a hybrid approach: on the one hand it relies on supervi...

Unsupervised WSD based on Automatically Retrieved Examples: The Importance of Bias

This paper explores the large-scale acquisition of sense-tagged examples for Word Sense Disambiguation (WSD). We have applied the “WordNet monosemous relatives” method to construct automatically a web corpus that we have used to train disambiguation systems. The corpus-building process has highlighted important factors, such as the distribution of senses (bias). The corpus has been used to trai...

Publication date: 2006
